Abstract

Priority inversion occurs when a process is delayed by the actions
of another process with less priority. With atomic transactions, the
concurrency control mechanism can cause delays and, unless it takes
priorities into account, can be a source of priority inversion. In this
paper, three traditional concurrency control algorithms are extended
so that they are free from unbounded priority inversion.

Keywords: Priority inversion, concurrency control, real-time databases.

In a real-time system, the actions of some process may be more urgent than 
those of another. For example, the first process may need to synchronize 
with a physical process and so must meet a deadline. If both processes have
access to common resources that cannot be shared, the less urgent process 
may delay the more urgent one by holding onto the resource. This situation 

*Submitted to the 10th Real-Time Systems Symposium, Los Angeles, December 1989.
†This work was supported by the Defense Advanced Research Projects Agency (DoD)
under ARPA order 6037, Contract N00140-87-C-8904. The views, opinions, and findings
contained in this report are those of the authors and should not be construed as an official
Department of Defense position, policy, or decision.
algorithms in the conclusions of the paper.

In section 1, we describe the properties a concurrency control mechanism 
must have if it is to support transactions with priorities. In section 2 we 
develop a general concurrency control mechanism based on serialization 
graph testing algorithms that detects priority inversions. While easy to 
understand, such algorithms are complex to implement since a directed 
graph must be maintained and updated with each operation submitted to 
the scheduler. 

There are two popular concurrency control mechanisms where the
scheduler uses a much simpler data structure at a cost of reduced concurrency.
One (two-phase locking) delays operations to ensure serializability while
the other (timestamp order) aborts operations to ensure serializability. In
section 3 we show that the typical extension of two-phase locking does prevent
priority inversion when the priorities are drawn from a connected order. In
section 4 we develop a timestamp order mechanism that detects priority
inversion.

In this paper, we follow the notation and system model found in [2]. 


1 Concurrency Control 

Suppose we have a set of processes submitting operations under transac- 
tions to a database scheduler. Each process can submit an unspecified 
number of transactions. 

There exists a partial order ≻ of priorities over the transactions, where
p_1 ≻ p_2 means process 1 has priority over process 2. A transaction T_i
submitted by p_i has the same priority as p_i, so we can also write expressions
like T_1 ≻ T_2. The database scheduler knows ≻ but has no other information
about the transactions any process will submit. A transaction’s priority is
static; it cannot be changed by the scheduler or the process submitting the
transaction.

Our goal is to devise a concurrency control algorithm that: 

1. ensures the resulting execution is serializable, and 
A more dramatic delay is a cascaded abort. Using the above example, since
T_2 has read x written by T_1, if T_1 decides to abort, then T_2 must also
abort. Again, the scheduler can prevent this condition by delaying some
Figure 1: Reads-From Graph

Theorem 1 If the RFG of a set of transactions contains a priority inversion
cycle, a priority inversion can occur.

Proof: Suppose we have an RFG that contains such a cycle. Let the two
transactions with the priority edge between them be T_i and T_j such that
T_i ≻ T_j. By the definition of an RFG, T_j is active. If T_i wishes to commit, it
must delay until T_j commits; otherwise, the resulting execution would
not be recoverable. Additionally, if T_j aborts, T_i must (transitively) abort.
Both cases represent a priority inversion. □

A purely conservative scheduler is a scheduler that never rejects an
operation (thereby aborting the transaction submitting the rejected
operation); it only delays operations until it is safe to execute them.
Theorem 1 implies that there are no purely conservative schedulers that
avoid priority inversion. Suppose such a scheduler existed, and it were
submitted the operation w_j[x] where p_i ≻ p_j. By Theorem 1, if T_i were
to submit the operation r_i[x], it would introduce the possibility of a
priority inversion. So, the scheduler must delay the write operation until
it knows that T_i will not submit an r_i[x] before T_j commits. Since the
nature of the transactions submitted by p_i is unknown to the scheduler,
it must delay w_j[x] forever.
SGT schedulers are more of theoretical than practical interest. They are
easy to understand and argue correct, but the overhead of maintaining a
serialization graph may not justify any increase in concurrency over other
schedulers. In this section, an SGT scheduler will be extended to avoid
priority inversions. This extension increases the complexity of the
scheduler. In particular, much of the simplicity of SGT schedulers comes
from aborting a transaction only when it submits an operation. As noted in
section 1, this policy cannot be used when avoiding priority inversion.

An SGT scheduler operates as follows. When a transaction T_i submits an
operation p_i[x], the scheduler tentatively adds conflict edges from all
vertices T_j to T_i if there exists an operation q_j[x] executed earlier that
conflicts with p_i[x]. If p_i[x] creates a cycle in the serialization graph,
the scheduler aborts T_i, since the resulting execution would not be
serializable. Once aborted, T_i is removed from the graph along with all
edges either into or out of T_i. If p_i[x] does not create a cycle, the
tentative edges can be made permanent and the operation executed.
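
The procedure just described can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the scheduler analyzed in this paper: the class name SGTScheduler, its methods, and the encoding of operations as (transaction, kind, variable) tuples are all hypothetical, and commit handling and the removal of committed transactions are omitted.

```python
from collections import defaultdict


class SGTScheduler:
    """Toy serialization graph testing scheduler (illustrative names only)."""

    def __init__(self):
        self.edges = defaultdict(set)   # T_j -> {T_i}: conflict edge T_j -> T_i
        self.history = []               # executed operations: (txn, kind, var)

    def _reachable(self, src, dst):
        # Depth-first search over the permanent conflict edges.
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.edges[node])
        return False

    def submit(self, txn, kind, var):
        """Return True if the operation executes, False if txn is aborted."""
        # Tentative edges T_j -> txn, one per earlier conflicting operation.
        # Two operations on the same variable conflict if either is a write.
        sources = {
            other
            for other, other_kind, other_var in self.history
            if other != txn and other_var == var and "w" in (kind, other_kind)
        }
        # Adding T_j -> txn closes a cycle exactly when txn already reaches T_j.
        if any(self._reachable(txn, src) for src in sources):
            self.abort(txn)
            return False
        for src in sources:
            self.edges[src].add(txn)    # make the tentative edges permanent
        self.history.append((txn, kind, var))
        return True

    def abort(self, txn):
        # Remove the vertex, all edges into or out of it, and its operations.
        self.edges.pop(txn, None)
        for targets in self.edges.values():
            targets.discard(txn)
        self.history = [op for op in self.history if op[0] != txn]
```

Submitting r_1[x], w_2[x], w_2[y], r_1[y] in that order adds the edge T_1 → T_2 at the second step; the fourth operation would add T_2 → T_1, so the sketch aborts T_1.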

To ensure the executed instructions are recoverable, the scheduler delays
the commit from T_i until all transactions from which T_i read have also
committed. Once T_i has committed, T_i can be removed from the serialization
graph when it cannot be involved in any future cycles. Since all operations
after T_i’s commit will be ordered after T_i, any new edges will be added
leading out of T_i. This means T_i can be removed when there are no edges in
the graph leading into T_i. We will assume such transactions are
automatically removed.

A priority serialization graph testing scheduler (or PSGT scheduler) follows
a similar strategy, with the caveats outlined in section 1. In particular, the
rejection strategy of SGT can cause a priority inversion. Instead of aborting
the transaction that submitted the operation, we may have to abort a
transaction with less priority. By generating priority committed histories, we
will always be able to abort such transactions.

However, this strategy complicates the scheduler. If the submitted
operation is a write, it could conflict with several unordered reads. Each
new conflict can create a distinct cycle in the serialization graph. With
SGT, all cycles are avoided by rejecting the new operation; with PSGT, we
may have to abort a different transaction from each cycle.
Additionally, the PSGT scheduler will need to avoid priority inversions
caused by cascaded aborts. The scheduler can do so by maintaining an
RFG and checking for priority inversion cycles. Maintenance of an RFG
is not as straightforward as that of a serialization graph. When a
transaction is aborted, the reads-from relation changes, which can create
new priority inversion cycles. For example, consider the following history where
The only priority inversion cycle is (T_0, T_3). Once T_3 is aborted, the cycle
(T_0, T_2) is created, and when T_2 is aborted the cycle (T_0, T_1) is created.

One way to simplify detecting and removing priority inversion cycles is to
augment the RFG. An augmented RFG will contain a vertex for each active
transaction, and three kinds of edges:

1. Priority edges, as in an RFG.

2. Read-from edges, as in an RFG, except that the edge is labeled with
the name of the variable that was read.

3. Write-after edges, also labeled with the name of a variable. When
a transaction T_i writes a variable x, a write-after edge labeled x is
drawn from the last transaction that wrote x (if it is still active) to
T_i.
When a read-from edge is added to the augmented RFG, the graph can
be traversed to determine which transactions should be aborted. Let
Abort(T, v, p) be the set of transactions that must be aborted due
to a read of the value of v written by T, where p is the transaction
that submitted the original read operation. The functions read(T, x) and
write(T, x) encode the reads-from and write-after edges; i.e. they are the
transaction from which T read x and wrote x after, respectively. Abort is
recursively defined as follows.
Abort(T, v, p) =

    if p ≻ T → {T} ∪ Abort(write(T, v), v, p)

    □  p ⊁ T → ∀ variables w read by T:

                   ⋃ Abort(read(T, w), w, p)

    fi

Figure 2 shows an example, where write-after edges are drawn as doubled
arrows. When T_1 submits r_1[x], the function Abort(T_2, x, T_1) is evaluated,
yielding {T_3, T_4}. T_2 will also be aborted as a cascaded abort.



Figure 2: Abort(T_2, x, T_1) = {T_3, T_4}
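
The recursive definition of Abort can be transcribed almost directly into Python. The sketch below is one possible reading, not a definitive implementation: the names abort_set, has_priority, read_from and write_after are hypothetical, returning the empty set when an edge is missing is an assumption (the definition does not state a base case), and the augmented RFG is assumed to be acyclic so that the recursion terminates. The example graph is invented so that it produces the same result as Figure 2; it is not claimed to be the graph drawn there.

```python
def abort_set(t, v, p, has_priority, read_from, write_after):
    """Transactions to abort, following the recursive shape of Abort(T, v, p).

    has_priority(a, b) -> True when a ≻ b (hypothetical helper).
    read_from[t]       -> {variable: transaction t read that variable from}.
    write_after[(t,v)] -> transaction that t wrote v after, if still active.
    """
    if t is None:
        # Assumed base case: no active transaction at the end of the edge.
        return set()
    if has_priority(p, t):
        # p ≻ T: abort T itself, then follow the write-after edge labeled v.
        previous_writer = write_after.get((t, v))
        return {t} | abort_set(previous_writer, v, p,
                               has_priority, read_from, write_after)
    # p ⊁ T: take the union over every variable w that T has read.
    result = set()
    for w, source in read_from.get(t, {}).items():
        result |= abort_set(source, w, p, has_priority, read_from, write_after)
    return result


# Hypothetical augmented RFG: T2 read y from T3, and T3 wrote y after T4.
# T1 has priority over T3 and T4 but not over T2.
priorities = {("T1", "T3"), ("T1", "T4")}
read_from = {"T2": {"y": "T3"}}
write_after = {("T3", "y"): "T4"}

result = abort_set("T2", "x", "T1",
                   lambda a, b: (a, b) in priorities,
                   read_from, write_after)
# result == {"T3", "T4"}
```

In this example T2 is not in the returned set because T1 does not have priority over it; it would be removed afterwards as a cascaded abort, since it read from T3.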

A PSGT scheduler executes as follows. Let T_i be a transaction that has
submitted an operation p_i[x] to the scheduler.

• If p_i is a read or write operation:

1. Add the operation to the serialization graph as described above.
Let C be the set of cycles created by adding the new edges. If
|C| = 0, skip to step 3.
If T_i can be aborted without introducing a priority inversion;
• If p_i is a commit operation, the scheduler must ensure the history is
priority committed. The commit operation is delayed until all transactions
committed or of less or incomparable priority.
the scheduler should examine the cycles in order of ascending length. The
scheduler accumulates a list of transactions A to abort; if, when examining
cycle c, it is found that A ∩ c ≠ ∅, the scheduler need not select a
transaction from c.
RFG must be updated when transactions from C are aborted, and a
transaction must be able to find the value of a variable after a cascaded
3 Preemptive Two-Phase Locking


If we assume ≻ is connected (i.e. all processes have comparable priorities),
two-phase locking ([3], [2]) can be easily extended to detect and eliminate
priority inversion. Basic strict two-phase locking uses the following rules:

1. A transaction T_i acquires a lock on a data item before referencing the
item. These locks are typically read or write locks (also called share
and exclusive locks) depending on the submitted operation. T_i delays
until the required lock is available.

2. All locks held by T_i are released after T_i commits.

In order to avoid priority inversion, a preemptive version of two-phase
locking (P2PL) can be used. When T_i tries to acquire a lock, it waits until
either the lock is free or all processes holding the lock with conflicting
access have less priority. In the latter case, the scheduler then aborts the
transactions holding the lock and gives it to T_i. Since all committed
transactions follow the original two-phase rules, P2PL generates
serializable histories. Additionally, while 2PL is susceptible to deadlock,
P2PL limits deadlock to occur only among transactions with the same
priority. If a set of deadlocked processes have different priorities, there
must exist a priority inversion, and P2PL will detect it and remove it.
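
The preemption rule can be illustrated with a small lock table. The sketch below is only an illustration under several assumptions: priorities are integers drawn from a connected order (larger means more priority), the names P2PLLockManager, acquire, abort and release_all are hypothetical, and a request that must wait is simply reported as "wait" so that the caller can retry later. Wait queues and deadlock among transactions of equal priority are not modeled.

```python
class P2PLLockManager:
    """Toy preemptive lock table for strict two-phase locking (illustrative)."""

    def __init__(self, priority):
        self.priority = priority    # txn -> int, larger means more priority
        self.holders = {}           # item -> {txn: "r" or "w"}
        self.aborted = []           # transactions preempted so far

    def acquire(self, txn, item, mode):
        """Return "granted" or "wait" for a read ("r") or write ("w") lock."""
        holders = self.holders.setdefault(item, {})
        # Holders whose access conflicts with the request: at least one write.
        conflicting = [h for h, m in holders.items()
                       if h != txn and "w" in (mode, m)]
        # Wait unless every conflicting holder has strictly less priority.
        if any(self.priority[h] >= self.priority[txn] for h in conflicting):
            return "wait"
        # Preempt: abort the lower-priority holders, then take the lock.
        for h in conflicting:
            self.abort(h)
        if holders.get(txn) != "w":     # never downgrade a write lock
            holders[txn] = mode
        return "granted"

    def abort(self, txn):
        self.aborted.append(txn)
        self.release_all(txn)

    def release_all(self, txn):
        # Strict 2PL: called once, after txn commits or is aborted.
        for holders in self.holders.values():
            holders.pop(txn, None)
```

With priorities T1 = 2, T2 = 1 and T3 = 2, a write lock on x held by T2 is taken over by T1 (T2 is aborted), while a later read request from T3 must wait because T1 does not have less priority than T3.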

P2PL does not have cascaded aborts, so it cannot generate priority inversion
cycles in the RFG. A transaction T_i reads from another transaction T_j only
after T_j commits, and only active transactions are in the RFG, so the RFG
will contain no reads-from edges.

P2PL generates priority committed histories without additional delays at
commit. If T_i is ordered before T_j, either there exist two conflicting
operations p_i[x] < q_j[x] or there exists a transaction T_k such that T_i is
ordered before T_k and T_k is ordered before T_j. For strict two-phase locking,
(p_i[x] < q_j[x]) ⇒ (c_i < c_j), and since the commits form a total order,
(T_i ordered before T_j) ⇒ (c_i < c_j). This simplicity comes at a cost,
however. For example, consider the submitted history w_2[x]; w_1[x]; c_2; c_1
where T_1 ≻ T_2. Under PSGT, the commit from T_2 is delayed until after the
commit of T_1; under P2PL, T_2 is aborted by the write from T_1.




If ≻ is not connected, the extension is more complex. Let T_1, T_2, T_3 have
priorities T_2 ≻ T_3, and let T_3 acquire an exclusive lock on x and T_1
acquire an exclusive lock on y. If T_1 attempts to acquire the lock on x it
will block since T_1 ⊁ T_3. If T_2 attempts to acquire the lock on y it too
will block since T_2 ⊁ T_1. We now have T_2 transitively blocked on T_3,
which is a priority inversion. Extending P2PL to cover this case requires
considering all locks held by processes transitively blocking the request.

4 Priority Timestamp Order 

Timestamp order (TO) schedulers ensure that conflicting operations respect
an order assigned by the timestamps of the transactions that submit them.
Associated with each variable x in the database are a read timestamp x.r
and a write timestamp x.w. Three rules apply when a transaction T_i with
timestamp s_i submits an operation:

1. A read operation: if s_i < x.w then the read is too late and T_i is
aborted; otherwise, x.r is set to s_i and the read is executed.

2. A write operation: if s_i < x.r then this write is too late and T_i
is aborted; otherwise, the write is executed if s_i > x.w and x.w is set
to s_i.

3. A commit operation: it is delayed until all transactions that T_i has
read from have committed. There are several ways to achieve this.

A timestamp order concurrency control that detects priority inversion
(PTO) allocates timestamps such that priority inversion cycles in the RFG
cannot occur. A timestamp s_i for T_i is uniquely allocated from a total order
such that it meets the following two conditions:

1. For all committed transactions T_k, s_i > s_k.

2. For all active transactions T_j: if T_j ≻ T_i then s_i > s_j, and if
T_i ≻ T_j then s_j > s_i.

The first condition is the same as for typical TO schedulers: to do otherwise
implies the later transaction must appear to have run before a committed
transaction. The second condition guarantees that the RFG will contain
no priority inversion cycles: a reads-from edge cannot go from a transaction
with less priority to one with more priority. Since the timestamps have a
total order, there can be no reads-from path from a transaction with less
priority to one with more priority.

It is not difficult to generate timestamps that obey the above two conditions.
If a timestamp is represented as a number, the number space must be dense.
Consider transactions T_j with timestamp s_j and T_i ≻ T_j with timestamp
s_i < s_j. For any n, if n new transactions start with priority between T_i and
T_j, n timestamps with values s_i < s < s_j must be assigned. In practice this
shouldn’t be a real problem, and in extreme cases the scheduler can abort
T_j.
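
One way to obtain a dense number space is to use exact rationals. The following sketch assumes integer priorities from a connected order (larger means more priority) and uses Python's fractions.Fraction; the function allocate_timestamp and its parameters are hypothetical names introduced for illustration. It returns a value above every committed timestamp and every active timestamp of higher priority, and below every active timestamp of lower priority, or None when no such value exists.

```python
from fractions import Fraction


def allocate_timestamp(priority, committed, active):
    """Pick a timestamp for a new transaction under the two PTO conditions.

    priority  -- integer priority of the new transaction
    committed -- timestamps of committed transactions
    active    -- (priority, timestamp) pairs for the active transactions
    """
    active = [(q, Fraction(s)) for q, s in active]
    # Condition 1 and the first half of condition 2 give lower bounds.
    lower = [Fraction(s) for s in committed]
    lower += [s for q, s in active if q > priority]
    # The second half of condition 2 gives upper bounds.
    upper = [s for q, s in active if q < priority]

    lo = max(lower, default=Fraction(0))
    if upper:
        hi = min(upper)
        if lo >= hi:
            return None                 # no admissible value in (lo, hi)
        candidate = (lo + hi) / 2       # density: a midpoint always exists
    else:
        candidate = lo + 1

    # Timestamps must be unique, so step toward lo past any value in use
    # (for example, one held by an active transaction of equal priority).
    taken = set(lower) | {s for _, s in active}
    while candidate in taken:
        candidate = (lo + candidate) / 2
    return candidate
```

For example, with one committed timestamp 1 and active transactions of priority 5 and 1 holding timestamps 2 and 3, a new transaction of priority 3 is given 5/2, which lies strictly between them.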

PTO must use a different comparison rule than TO. With TO, a transaction
is aborted if it submits its operation too late: that is, it has too low a
timestamp. Under PTO the transaction with more priority could be the
one that is late, so the transaction that acted too early should be aborted.
As with PSGT, there can be several such transactions that acted too early. For
example, consider the history w_2[x]; r_3[x]; w_1[x] where T_1 ≻ T_2 ≻ T_3. The
first two operations happened too soon, and T_2 and T_3 are aborted. Instead of
associating a single read and write timestamp with a variable, a list of read
timestamps and write timestamps must be kept. For recoverability, each
list must contain at least one timestamp from a committed transaction.
These lists can grow arbitrarily long, but in practice this shouldn’t be a real
problem. A timestamp can be removed from a list if the list contains a larger
timestamp of a committed transaction. In extreme cases, the scheduler can
abort the active transaction with the largest timestamp; e.g. transaction
T_3 in the example above.
2. If it is a write operation: s_i is entered into x.w along with the value
being written. Let s be the smallest timestamp in x.w greater than
s_i, or ∞ if s_i is the largest timestamp. All transactions T_k in x.r that
have timestamps in the range s_i < s_k < s are aborted (there may be
no such transactions), as they read x too early.

3. If it is a commit operation, it is delayed until all transactions with
timestamps less than s_i have committed. Once the transaction
successfully commits, for each variable x in T_i’s read (cf. write) set the
timestamp lists x.r (cf. x.w) can be truncated: all timestamps less
than s_i can be removed.
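
The write rule in step 2 amounts to a range query over the sorted timestamp lists of a variable. The sketch below assumes that x.w and x.r are kept as sorted Python lists of numbers and uses the standard bisect module; the function name readers_to_abort is hypothetical, and storing the written value alongside its timestamp is left out for brevity.

```python
import bisect
import math


def readers_to_abort(write_ts, read_ts, s_i):
    """Apply the PTO write rule to the timestamp lists of one variable.

    write_ts -- sorted list of write timestamps (x.w); s_i is inserted into it
    read_ts  -- sorted list of read timestamps (x.r)
    Returns the read timestamps s_k with s_i < s_k < s, where s is the
    smallest write timestamp greater than s_i (infinity if there is none).
    """
    bisect.insort(write_ts, s_i)
    nxt = bisect.bisect_right(write_ts, s_i)
    s = write_ts[nxt] if nxt < len(write_ts) else math.inf
    # Readers strictly between s_i and s read an older value of x too early.
    first = bisect.bisect_right(read_ts, s_i)
    last = bisect.bisect_left(read_ts, s)
    return read_ts[first:last]
```

With write timestamps [1, 5] and read timestamps [2, 4, 6], a write with timestamp 3 selects only the reader with timestamp 4: the reader at 2 precedes the new write, and the reader at 6 correctly read the version written at 5.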

When a transaction T_j is aborted, its timestamps are removed from all
timestamp lists. Additionally, transactions that read from T_j must
also be aborted. For each variable x in T_j’s write set, let s be the smallest
timestamp in x.w greater than s_j. The transactions with timestamps in x.r
in the range s_j < s_k < s read x from T_j, so they are aborted.

PTO ensures serializability by using timestamps from a total order and
avoids priority inversions by its timestamp generation rule. The main
weakness with PTO is the delay in the commit operation. Consider a short
transaction that needs only to update x but is started at the same time a
long-running transaction with more priority is active. Even if the two
transactions never reference the same variables, the shorter transaction
cannot commit until the long-running transaction completes. With both P2PL
and PSGT, the shorter transaction will be able to complete without delay.
For PTO to do similarly, it must either maintain the actual reads-from
relation as PSGT does, or know more information about the transactions
(such as a transaction’s read set and write set).


5 Discussion 

This paper examined three common concurrency control algorithms and 
showed how each could be extended to avoid priority inversion. The results 
are mixed: 

• Without some knowledge of the transactions that will be submitted, 
there are no purely conservative concurrency control schedulers nor 
any practical purely aggressive concurrency control schedulers that 
avoid priority inversion. 

• Traditional aggressive schedulers, like serialization graph testing and
timestamp order schedulers, abort a transaction by rejecting an
operation when submitted. This method cannot be used when priority
inversion must be avoided. Instead, a transaction that submitted its
operation earlier must be aborted, so the more urgent transaction can
continue. This policy increases the complexity of aggressive
schedulers. In the case of serialization graph testing, it isn’t clear that
the increased concurrency would ever compensate for the increased
complexity, given a reasonable workload.

• The traditional conservative scheduler, two-phase locking, can be
easily extended to avoid priority inversion when the priority relation is
connected. The extension for nonconnected priorities is somewhat
more complex.

• Timestamp order schedulers, when extended to avoid priority inver- 
sion, suggest using a multiversion concurrency control algorithm. The 
extended algorithm is not much more complex than a traditional mul- 
tiversion timestamp order algorithm. However, transactions with less 
priority can be needlessly delayed unless read sets and write sets are 
declared when a transaction starts. 
The algorithms presented here have not been implemented, and their
relative performance has not been examined in any detail. Additionally, only
the priorities of transactions have been used to schedule or abort operations.
Other information could be used, such as the remaining running time of a
transaction. It isn’t clear what kind of information would be useful
for the more aggressive schedulers.

These algorithms were developed as part of the Cornell RR Project, in
which we are developing both theory and tools for building real-time
reliable systems. Part of this project is the development of a process control
system, which will eventually contain a database-like component.